Add: initial prototype of Markdown in Doenet #329
|
For the test-viewer, first you want to change the default for Then, the problem is that the prototype doesn't yet have many components implemented. It doesn't look like there is an Obviously, we don't want this behavior, but you could turn your markdown into one of the existing components to test it for now, such Or, I just made a PR to your branch adding an |
|
Also, for easier testing, you could use the test viewer inside the The current |
|
@dqnykamp, fantastic, thanks! That's very helpful -- knowing where to look for implemented tags and how to run the viewer makes all the difference. With that, it looks like this is up and running. I noticed that What's the next step? That is, given we now have the ability to render text as Markdown in the prototype Doenet viewer, does this make sense as a future Doenet feature? @siefkenj, any thoughts on this? |
| if (node.type === "text" && visitInfo.parents.length === 1) { | ||
| const html = commonMarkRenderer.render( | ||
| commonMarkParser.parse(node.value), | ||
| ); |
Very interesting strategy. It's certainly quick for now, but I think it would be better to convert the AST directly so we have full control over the nodes created (e.g., in code blocks).
Do you mean that it's possible we'll be using Markdown not just with text nodes, but in other places (inside code blocks, for example)? If so, then your comment below makes a lot of sense -- define where Markdown belongs and where it doesn't.
|
I think the next step is to set up some tests so we can all agree on what should be converted to what :-) |
Before that, perhaps a bit of discussion? I assume the question is: which Doenet components' content should be rendered using Markdown? Which should not? For example, the content of an Looking at the Doenet component types, any text inside paragraph markup components (alert, aslist, attr, etc.) should be rendered using Markdown. The text inside the following sectional components should be Markdown rendered: activity, aside, caption, cell (or should we instead use markdown tables?), conclusion, definition, example, exercise, hint, introduction, li, note, objectives, ol, p, problem, proof, question, section, solution, statement, theorem, title, topic, ul. No other components will be Markdown rendered. So, given text in the DAST, if the parent of the text is one of the markdown-enabled components, then that text will be Markdown rendered; otherwise, it won't. Thoughts? |
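To make the parent-based rule concrete, here is a minimal sketch. The node shapes, the shortened `MARKDOWN_ENABLED` list, and the `collectMarkdownText` helper are all assumptions for illustration, not the real Doenet types or API. The root is treated as Markdown-enabled because that is what this PR currently does for top-level text.

```typescript
// Hypothetical, simplified DAST node shapes; the real Doenet types are richer.
type DastText = { type: "text"; value: string };
type DastElement = { type: "element"; name: string; children: DastNode[] };
type DastNode = DastText | DastElement;
type DastRoot = { type: "root"; children: DastNode[] };

// Assumption: a shortened version of the component list proposed above.
const MARKDOWN_ENABLED = new Set(["p", "li", "title", "section", "hint", "caption"]);

// Return every text node whose immediate parent is Markdown-enabled.
function collectMarkdownText(root: DastRoot): DastText[] {
    const found: DastText[] = [];
    const visit = (children: DastNode[], parentEnabled: boolean): void => {
        for (const child of children) {
            if (child.type === "text") {
                if (parentEnabled) found.push(child);
            } else {
                // Only the immediate parent decides; ancestors further up do not.
                visit(child.children, MARKDOWN_ENABLED.has(child.name));
            }
        }
    };
    // Top-level text counts as Markdown, matching the current prototype.
    visit(root.children, true);
    return found;
}
```

With this rule, text directly inside `<p>` is collected, while text inside `<m>` is left alone even when the `<m>` sits inside a `<p>`.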
|
Didn't we discuss having markdown only work in the top-level components to start? |
That was the initial discussion and certainly the simplest approach, and (AFAIK) what this PR implements. However, I understood the following to imply this PR should cover more cases (see also my comment):
So, two questions:
|
|
Great! As you know, there's one simple test case already:
it("converts Markdown text to HTML", () => {
    let source: string;
    let dast: ReturnType<typeof lezerToDast>;
    source = `*hi*\n\n# there!`;
    dast = lezerToDast(source);
    expect(toXml(normalizeDocumentDast(dast))).toEqual(
        "<document><p><em>hi</em></p><h1>there!</h1></document>",
    );
});
I'm assuming more tests would cover Markdown located in some other place in Doenet, but not cover the functioning of Markdown itself. Given that, what other values for |
|
Every markdown feature should be tested, ideally in separate tests, and a few that blend some features together. So: lists, bold, em, links, images, paragraphs (other markdown features I am forgetting?) I would imagine tests like, Then there should be some tests about interleaving formats. So, if a user does One key thing for the tests to nail down is that: markdown cannot produce invalid DoenetML. If there is a feature of markdown that doesn't exist in doenet, it should be converted to its nearest equivalent. |
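One way the "markdown cannot produce invalid DoenetML" requirement might be checked in tests is to scan the serialized output for element names outside an allow-list. The sketch below is only that: `KNOWN_COMPONENTS` is a tiny made-up list (the real one would presumably come from Doenet's component definitions), and `unknownElements` is a hypothetical helper, not an existing function.

```typescript
// Assumption: a tiny allow-list for illustration only.
const KNOWN_COMPONENTS = new Set(["document", "p", "em", "h1", "ol", "ul", "li"]);

// Return the names of elements in `xml` that are not in the allow-list.
function unknownElements(xml: string, known: Set<string> = KNOWN_COMPONENTS): string[] {
    const unknown: string[] = [];
    // Matches the name of an opening or self-closing tag, e.g. <p>, <m attr="1">, <hr/>.
    // Closing tags such as </p> are skipped because "/" is not a letter.
    const openTag = /<([A-Za-z][\w-]*)(?=[\s\/>])/g;
    let match: RegExpExecArray | null;
    while ((match = openTag.exec(xml)) !== null) {
        const name = match[1];
        if (name !== undefined && !known.has(name) && unknown.indexOf(name) === -1) {
            unknown.push(name);
        }
    }
    return unknown;
}
```

A test could then assert that `unknownElements(toXml(...))` is empty for every Markdown feature, so an unsupported construct that leaks through as a raw HTML tag fails loudly instead of silently producing invalid DoenetML.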
|
I think it wouldn't be hard to come up with a small list of component types inside which we'd still apply markdown, such as |
|
I'm working on writing tests for every Markdown component. Following the spec, the first item (section 4.1) is a thematic break, aka
Testing
---
More testing
becomes
Testing More testing
Doenet doesn't support this, however. What should happen in this case?
I'd prefer choice 1, followed by choice 3. It's likely I can walk the parse tree from Markdown to de-transform this. Other items that don't translate:
Translation notes:
|
|
I've frequently been asked for an
I'm not certain about line breaks. What's the use case where paragraphs wouldn't work? |
Sounds good. For now, I'll just pass the
Here's one place I think it's helpful. Paragraphs in the default style produce two newlines, whereas a line break produces just one.
|
|
I'm making good progress; I've just pushed updated code with a number of tests. I ran into one minor problem and one major problem:
Thoughts? |
|
More thoughts on this: parsing with Markdown first is difficult and a bad idea. However, the current parsing approach needs to change. A sketch:
For example, consider the following DoenetML:
<p>This is a block component. This component does not contain Markdown.</p>
<m>x^2</m> is some math in a paragraph that **should** be interpreted as Markdown.
<choiceInput> (a block component, doesn't contain Markdown)
  <choice>**Agree**</choice> (These can contain Markdown)
  <choice>Disagree</choice>
</choiceInput>
We have something like this for the parse tree: In the diagram, the dotted boxes indicate areas containing Markdown text to be processed as described above. |
I think we just count that syntax as invalid. Markdown also interprets bare links as links, so there is no need for the |
| visit(tree, (node, visitInfo) => { | ||
| // All text nodes that are children of the root node are interpreted as Markdown. | ||
| if (node.type === "text" && visitInfo.parents.length === 1) { | ||
| const parse_tree = commonMarkParser.parse(node.value); |
I'd really like to work directly with parse_tree here instead of turning it into a string and reparsing it.
Thanks for taking the time to look through this code. I appreciate the feedback!
I agree that transforming directly from the CommonMark parse tree to the DAST parse tree would be more elegant. However, I suggest postponing optimizations (such as this) for later, when we're confident this will be adopted. My thought process:
Current approach:
- Pro: currently works.
- Con: inefficient -- the parse tree is converted to text, then parsed as DAST, then CommonMark nodes are transformed to DAST.
This suggested approach:
- Pro: more efficient, since it skips parsing HTML to DAST.
- Con: requires more code, since items that don't currently need to be translated between HTML and DAST now need code added. Also need code to convert between the CommonMark parse tree and the DAST format.
Notes on this approach: the usage section of the CommonMark docs shows a handy walker function to walk the CommonMark parse tree, and the docs seem pretty good. Are there similar docs on the DAST structure, which would make this easier?
My conclusion:
In the end, this is an optimization, not a change in functionality. Therefore, I'd suggest this as work for later, when we're more confident that this is the right approach.
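For reference, the direct tree-to-tree conversion being discussed could look roughly like the sketch below. It is a sketch under stated assumptions: `MdNode` is a made-up, simplified Markdown AST (commonmark.js actually exposes linked nodes and a walker rather than `children` arrays), `DastNode` is a simplified stand-in for the real DAST types, and the element names are taken from the existing test's expected output (`p`, `em`, `h1`).

```typescript
// Hypothetical, simplified Markdown AST for illustration.
type MdText = { type: "text"; literal: string };
type MdHeading = { type: "heading"; level: number; children: MdNode[] };
type MdContainer = { type: "document" | "paragraph" | "emph"; children: MdNode[] };
type MdNode = MdText | MdHeading | MdContainer;

// Hypothetical, simplified DAST node shapes.
type DastNode =
    | { type: "text"; value: string }
    | { type: "element"; name: string; children: DastNode[] };

function convertChildren(children: MdNode[]): DastNode[] {
    const out: DastNode[] = [];
    for (const child of children) out.push(...mdToDast(child));
    return out;
}

// Convert one Markdown node into zero or more DAST nodes, with no HTML round trip.
function mdToDast(node: MdNode): DastNode[] {
    switch (node.type) {
        case "text":
            return [{ type: "text", value: node.literal }];
        case "document":
            // The document itself adds no element; its children are spliced in.
            return convertChildren(node.children);
        case "paragraph":
            return [{ type: "element", name: "p", children: convertChildren(node.children) }];
        case "emph":
            return [{ type: "element", name: "em", children: convertChildren(node.children) }];
        case "heading":
            return [
                {
                    type: "element",
                    name: "h" + node.level,
                    children: convertChildren(node.children),
                },
            ];
    }
}
```

Each Markdown node type needs its own case here, which is the extra code the "Con" above refers to; the benefit is that every produced node is chosen explicitly.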
|
Let's focus on the simple situation for now where there is no mixing of XML and Markdown. Handling situations like |
Sounds good. To the best of my knowledge, the current test suite evaluates all Markdown features and either translates them to DoenetML or produces known errors (Markdown autolinks aren't supported; mixed XML and Markdown isn't supported). The next step AFAIK is to get feedback: is this a feature people need? Therefore, what's the best approach to gather some feedback? |
|
Thoughts on the approach to solving this:
Notes:
|
9cfe50a to 535d930
|
I worked on this over the weekend, implementing the ideas above. They don't work: Markdown shouldn't simply replace existing textual content, it should also create new top-level hierarchy, moving nodes as necessary into that hierarchy. For example, the paragraph
This <m>x^2</m> is...
should become
<p>
This <m>x^2</m> is...
</p>
meaning we move the existing I think the best approach is to run the Markdown parser first, then run the DAST parser/tree builder, so we can create this hierarchy before it's transformed into DAST. My sketch:
Thoughts? |
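To illustrate the kind of restructuring described above (moving existing nodes into a newly created `<p>`), here is a minimal sketch that operates on an already-built tree. It is not the Markdown-first pipeline proposed above, and it assumes a simplified node shape, a made-up `BLOCK_ELEMENTS` list, and a hypothetical `wrapInlineRuns` helper. It also treats each run of inline content as a single paragraph and ignores blank-line paragraph breaks inside a run.

```typescript
// Hypothetical, simplified DAST node shapes.
type DastNode =
    | { type: "text"; value: string }
    | { type: "element"; name: string; children: DastNode[] };

// Assumption: which element names count as block-level.
const BLOCK_ELEMENTS = new Set(["p", "section", "choiceInput"]);

function isBlock(node: DastNode): boolean {
    return node.type === "element" && BLOCK_ELEMENTS.has(node.name);
}

function isBlank(node: DastNode): boolean {
    return node.type === "text" && node.value.trim() === "";
}

// Wrap each maximal run of inline nodes (text and inline elements such as <m>)
// in a new <p>, leaving existing block elements where they are.
function wrapInlineRuns(children: DastNode[]): DastNode[] {
    const out: DastNode[] = [];
    let run: DastNode[] = [];
    const flush = (): void => {
        // Whitespace-only runs (e.g. newlines between blocks) are dropped.
        if (run.some((n) => !isBlank(n))) {
            out.push({ type: "element", name: "p", children: run });
        }
        run = [];
    };
    for (const child of children) {
        if (isBlock(child)) {
            flush();
            out.push(child);
        } else {
            run.push(child);
        }
    }
    flush();
    return out;
}
```

Applied to the children `This `, `<m>x^2</m>`, ` is...`, this yields a single `<p>` containing all three nodes, with the `<m>` element moved rather than re-parsed.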
|
There is a You should be able to do a If you focus just on blocks of text, does the approach work? (e.g., excluding your example that has |
|
Pure blocks of text without XML work. Getting Markdown to work inside XML block tags (like Getting block Markdown (blockquote, paragraph, code blocks) which contains XML tags is hard, and probably needs the process I sketched out earlier. |
|
Yes, markdown with xml in it will be harder...
There is a remark playground here, so you can get a feel for their AST: https://remark.js.org/ |


Several tests fail; a quick inspection shows failures related to treating text as Markdown, so I'm ignoring these failures until we decide to move in this direction.
The test I added in dast-basic.test.ts shows that text is correctly parsed as Markdown by normalizeDocumentDast. However, the test-viewer application doesn't render the resulting DAST, even though the resulting DAST tree I print from normalizeDocumentDast seems correct when comparing <p><em>testing</em></p> and its Markdown equivalent, *testing*. Any ideas on why this DAST doesn't render?
The resulting DAST:

This renders as:
